    Document Filtering for Long-tail Entities

    Filtering relevant documents with respect to entities is an essential task in the context of knowledge base construction and maintenance. It entails processing a time-ordered stream of documents that might be relevant to an entity in order to select only those that contain vital information. State-of-the-art approaches to document filtering for popular entities are entity-dependent: they rely on, and are trained on, the specific differentiating features of each individual entity. Moreover, these approaches tend to use so-called extrinsic information, such as Wikipedia page views and related entities, which is typically available only for popular head entities. Entity-dependent approaches based on such signals are therefore ill-suited as filtering methods for long-tail entities. In this paper we propose a document filtering method for long-tail entities that is entity-independent and thus also generalizes to unseen or rarely seen entities. It is based on intrinsic features, i.e., features that are derived from the documents in which the entities are mentioned. We propose a set of features that capture informativeness, entity saliency, and timeliness. In particular, we introduce features based on entity aspect similarities, relation patterns, and temporal expressions, and combine these with standard features for document filtering. Experiments following the TREC KBA 2014 setup on a publicly available dataset show that our model improves filtering performance for long-tail entities over several baselines. Results of applying the model to unseen entities are promising, indicating that the model is able to learn the general characteristics of a vital document. The overall performance across all entities (i.e., not just long-tail entities) improves upon the state of the art without depending on any entity-specific training data.
    Comment: CIKM2016, Proceedings of the 25th ACM International Conference on Information and Knowledge Management. 201
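To make "intrinsic, entity-independent features" concrete, here is a minimal sketch of feature extraction that uses only the document text and the entity's surface name. The specific features, the regex, and the names `extract_features` and `IntrinsicFeatures` are hypothetical illustrations of the three feature families named above (saliency, timeliness, informativeness); they are not the paper's feature set, which also includes aspect similarities and relation patterns. Because nothing here depends on which entity is being filtered, one classifier could be trained on such vectors pooled across all entities.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately crude pattern for explicit temporal expressions
# (four-digit years, month names, a few relative day words). A real system
# would use a proper temporal tagger instead.
TEMPORAL_PATTERN = re.compile(
    r"\b(?:19|20)\d{2}\b"
    r"|\b(?:January|February|March|April|May|June|July|August|September"
    r"|October|November|December)\b"
    r"|\b(?:today|yesterday|tomorrow)\b",
    re.IGNORECASE,
)


@dataclass
class IntrinsicFeatures:
    mention_count: int          # saliency proxy: how often the entity name occurs
    first_mention_ratio: float  # saliency proxy: 0.0 = at the start, 1.0 = absent
    temporal_count: int         # timeliness proxy: number of temporal expressions
    length_tokens: int          # informativeness proxy: whitespace-token count


def extract_features(document: str, entity: str) -> IntrinsicFeatures:
    """Compute simple document-derived features for one (document, entity) pair."""
    lowered = document.lower()
    name = entity.lower()
    mention_count = lowered.count(name)
    first = lowered.find(name)
    # Relative character offset of the first mention; 1.0 if the entity is absent.
    first_ratio = first / max(len(document), 1) if first >= 0 else 1.0
    temporal_count = len(TEMPORAL_PATTERN.findall(document))
    return IntrinsicFeatures(mention_count, first_ratio, temporal_count,
                             len(document.split()))
```

For example, `extract_features("Jane Doe joined Acme in March 2014 and spoke today.", "Jane Doe")` yields one mention at the very start, three temporal expressions, and ten tokens.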

    Towards a framework for critical citizenship education

    Increasingly, countries around the world are promoting forms of "critical" citizenship in the planned curricula of schools. However, the intended meaning behind this term varies markedly, ranging from a set of creative and technical skills under the label "critical thinking" to a desire to encourage engagement, action and political emancipation, often labelled "critical pedagogy". This paper distinguishes these manifestations of the "critical" and, based on an analysis of the prevailing models of critical pedagogy and citizenship education, develops a conceptual framework for analysing and comparing the nature of critical citizenship.

    Non-invasive beam profile monitor for medical accelerators

    A beam profile monitor based on a supersonic gas curtain is currently under development for transverse profile diagnostics of electron and proton beams in the High Luminosity LHC. This monitor uses a thin supersonic gas curtain that crosses the primary beam to be characterized at an angle of 45 degrees. The fluorescence caused by the interaction between the beam and the gas curtain is detected using a specially designed imaging system to determine the 2D transverse profile of the primary beam. Another prototype monitor, based on beam-induced ionization, is installed at the Cockcroft Institute. This paper presents the design features of both monitors, the gas-jet curtain formation and various experimental tests, including profile measurements of an electron beam using helium, nitrogen and neon as gases. Such a non-invasive online beam profile monitor would also be highly desirable for medical LINACs and storage rings, as it can characterize the beam without stopping machine operation. The paper discusses opportunities for simplifying the monitor design for integration into a medical accelerator, and the expected monitor performance.

    Mining clinical relationships from patient narratives

    Background: The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships. Results: We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter-annotator agreement on the same task. We investigate the effectiveness of different features for this task and how extraction performance varies between inter- and intra-sentential relationships, and we examine the amount of training data needed to learn various relationships. Conclusion: We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining, and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work should be carried out on a larger sample of narratives and relationship types to confirm our encouraging results.
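The abstract reports an "average F1 score" over seven relation types but does not say whether the average is macro (per type, then averaged) or micro (pooled counts). The sketch below shows macro-averaging as one plausible reading; the relation-type labels and counts used with it are hypothetical, not taken from the CLEF corpus.

```python
from typing import Dict, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 = 2PR / (P + R); returns 0.0 when it is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-relation-type F1, from (tp, fp, fn) counts."""
    if not counts:
        raise ValueError("need at least one relation type")
    scores = [f1_from_counts(*c) for c in counts.values()]
    return sum(scores) / len(scores)
```

With hypothetical counts `{"rel_A": (8, 2, 2), "rel_B": (6, 4, 0)}`, the per-type scores are 0.8 and 0.75, so the macro average is 0.775. A micro average would instead pool the counts first, which weights frequent relation types more heavily.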

    Reflections on the 'History and Historians' of the black woman's role in the community of slaves: enslaved women and intimate partner sexual violence

    Taking as points of inspiration Peter Parish’s 1989 book, Slavery: History and Historians, and Angela Davis’s seminal 1971 article, “Reflections on the black woman’s role in the community of slaves,” this article probes, both historiographically and methodologically, some of the challenges faced by historians writing about the lives of enslaved women, through a case study of intimate partner violence among enslaved people in the antebellum South. Because rape and sexual assault have been defined in the past as non-consensual sexual acts supported by surviving legal evidence (generally testimony from court trials), it is hard for historians to research rape and sexual violence under slavery (especially marital rape), as there was no legal standing for the rape of enslaved women or the rape of any woman within marriage. This article suggests that enslaved women recognized that black men could be perpetrators of sexual violence and, simultaneously, victims of the system of slavery. It also argues that women stoically tolerated being forced into intimate relationships, sometimes even staying with “husbands” imposed upon them after emancipation.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
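As a rough sense of scale, the figures above can be combined directly. This assumes, as the abstract does not state, that the quoted error rate applies uniformly across the sequence; the symbols are ours: N is the number of nucleotides, r the error rate per base, and G the number of gaps. Under that assumption, the finished sequence would still contain on the order of 28,500 residual errors, with on average roughly 8.4 million nucleotides of sequence per remaining gap. Because the gaps cluster around segmental duplications, this average hides a very uneven distribution.

```latex
E \approx N \cdot r = (2.85 \times 10^{9}) \times \frac{1}{10^{5}} = 2.85 \times 10^{4}
\qquad
\bar{L}_{\text{gap}} \approx \frac{N}{G} = \frac{2.85 \times 10^{9}}{341} \approx 8.4 \times 10^{6}
```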

    Speaker Model and Decision Threshold Updating in Speaker Verification

    Extracting Emerging Knowledge from Social Media

    Massive data integration technologies have recently been used to produce very large ontologies. However, knowledge in the world continuously evolves, and ontologies are largely incomplete with respect to low-frequency data belonging to the so-called long tail. Socially produced content is an excellent source for discovering emerging knowledge: it is huge, and it immediately reflects the relevant changes in which emerging entities are hidden. Thus, we propose a method for discovering emerging entities by extracting them from social content. Once set up by experts through a very simple initialization, the method is capable of finding emerging entities; we use a purely syntactic method as a baseline, and we propose several semantics-based variants. The method uses seeds, i.e., prototypes of emerging entities provided by experts, for generating candidates; it then associates candidates with feature vectors, built by using terms occurring in their social content, and ranks the candidates by their distance from the centroid of the seeds, returning the top candidates as the result. The method can be continuously or periodically iterated, using the results as new seeds. We validate our method by applying it to a set of diverse domain-specific application scenarios, spanning fashion, literature, and exhibitions.
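The ranking and iteration steps described above can be sketched as follows. The abstract does not specify the vector weighting or the distance metric, so this sketch assumes raw term-frequency vectors and Euclidean distance. It corresponds only to the syntactic baseline, not the semantics-based variants. The function names and the example seed and candidate terms are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(terms: List[str]) -> Counter:
    """Bag-of-words vector: term -> frequency in an entity's social content."""
    return Counter(terms)


def centroid(vectors: List[Counter]) -> Dict[str, float]:
    """Component-wise mean of the seed vectors."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return {t: c / len(vectors) for t, c in total.items()}


def euclidean(v: Counter, c: Dict[str, float]) -> float:
    keys = set(v) | set(c)
    return math.sqrt(sum((v.get(k, 0) - c.get(k, 0.0)) ** 2 for k in keys))


def rank_candidates(seeds: Dict[str, List[str]],
                    candidates: Dict[str, List[str]], top_k: int) -> List[str]:
    """Return the top_k candidate names closest to the seed centroid."""
    c = centroid([term_vector(t) for t in seeds.values()])
    ranked = sorted(candidates,
                    key=lambda name: euclidean(term_vector(candidates[name]), c))
    return ranked[:top_k]


def iterate_discovery(seeds: Dict[str, List[str]],
                      candidates: Dict[str, List[str]], top_k: int,
                      rounds: int) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Repeatedly promote the top candidates to seeds (inputs are not mutated)."""
    seeds, pool = dict(seeds), dict(candidates)
    for _ in range(rounds):
        if not pool:
            break
        for name in rank_candidates(seeds, pool, top_k):
            seeds[name] = pool.pop(name)
    return seeds, pool
```

For instance, with fashion-like seeds built from terms such as "runway" and "designer", a candidate whose social content mentions "runway", "designer" and "dress" ranks above one described by "novel", "author" and "book". In the next round, it becomes a seed itself.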